Distributed Job Allocation for Large-Scale Manycores
Abstract
Contemporary operating systems rely heavily on single system images with shared-memory constructs that may not scale well to large core counts. We consider the challenge of distributed job allocation, where each job consists of a set of tasks to be mapped to disjoint cores. A naive solution that performs fragmented allocations may quickly lead to deadlocks, in which jobs hold some cores and wait for others in circular dependencies. To tackle this challenge, we propose a deadlock-free distributed job allocation protocol with two policies for avoiding deadlocks: active cancellation and sequencer-based atomic broadcast. The protocol and both policies have been implemented and evaluated on a Tilera TilePro64 processor with 64 cores on a single socket. Results show that active cancellation incurs lower overhead for sparse job allocations, while sequencer-based atomic broadcast incurs lower overhead for denser allocations.
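The circular hold-and-wait problem and the ordering idea behind a sequencer can be illustrated with a toy model. The following is a minimal sketch, not the paper's implementation: the function names (`grant_first`, `stuck`, `sequenced_arrivals`), the two-job workload, and the rule that each core grants itself to the first request it receives are all assumptions made for illustration. Active cancellation is not modeled here.

```python
def grant_first(arrivals):
    """Each core grants itself to the first request in its local arrival order.

    arrivals: dict mapping core id -> list of job ids in arrival order
    (hypothetical model of per-core request queues).
    """
    return {core: queue[0] for core, queue in arrivals.items() if queue}


def stuck(jobs, owner):
    """True if no job holds every core it asked for (hold-and-wait, no progress)."""
    return not any(
        all(owner.get(core) == job for core in cores)
        for job, cores in jobs.items()
    )


def sequenced_arrivals(jobs, sequence):
    """Deliver requests to every core in one global order.

    sequence stands in for the order a sequencer would assign (assumption).
    """
    cores = set().union(*jobs.values())
    return {c: [j for j in sequence if c in jobs[j]] for c in cores}


# Two jobs that both need cores 0 and 1.
jobs = {"A": {0, 1}, "B": {0, 1}}

# Fragmented allocation: the two cores see the requests in different orders.
naive = {0: ["A", "B"], 1: ["B", "A"]}
naive_owner = grant_first(naive)

# Sequenced delivery: every core sees the requests in the same order.
seq_owner = grant_first(sequenced_arrivals(jobs, ["A", "B"]))

print(stuck(jobs, naive_owner))  # True: A holds core 0, B holds core 1
print(stuck(jobs, seq_owner))    # False: A holds both cores and can run
```

In this model the first job in the global order is first in the queue of every core it requests, so at least one job can always start. That is the property a total delivery order gives; the cost of establishing that order is not captured by the sketch.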
Similar resources
Exploring Distributed Resource Allocation Techniques in the SLURM Job Management System
With the exponential growth of distributed computing systems in both flops and cores, scientific applications are growing more diverse with a variety of workloads. These workloads include traditional large-scale High Performance Computing MPI jobs, and ensemble workloads, such as Many-Task Computing workloads comprising an extremely large number of tasks of finer granularity, where tasks are d...
Job Allocation for Large-Scale Many-cores
RAMACHANDRAN, SUBRAMANIAN. Distributed Job Allocation for Large-Scale Many-cores. (Under the direction of Dr. Frank Mueller.) As today’s manycore processors already feature over 64 cores and as tomorrow’s are slated to contain 1000s, it is important to design operating system techniques that can efficiently cope with this scale of resource coordination. The current state-of-the-art in manycore ...
Optimized Contract-based Model for Resource Allocation in Federated Geo-distributed Clouds
In the era of Big Data, with data growing massively in scale and velocity, cloud computing and its pay-as-you-go model continue to provide significant cost benefits and a seamless service delivery model for cloud consumers. The evolution of small-scale and large-scale geo-distributed datacenters operated and managed by individual Cloud Service Providers (CSPs) raises new challenges in terms of...
Job Admission and Resource Allocation in Distributed Streaming Systems
This paper describes a novel scheme for job admission and resource allocation employed by the SODA scheduler in System S. Capable of processing enormous quantities of streaming data, System S is a large-scale, distributed stream processing system designed to handle complex applications. The problem of scheduling in distributed, stream-based systems is quite unlike that in more traditio...
Multiagent coordination for Multiple Resource Job Scheduling
Efficient management of large-scale job processing systems is a challenging problem, particularly in the presence of multiple users and dynamically changing system conditions. In addition, many real-world systems require the processing of multi-resource jobs where centralized coordination may be difficult. Most conventional algorithms, such as load balancing, are designed for centralized, single re...